Dimodal Excursion and Permutation Tests: Feature significance test using permutation or bootstrap (excursion) sampling.

Description

Repeatedly draw samples from a base set of differential values, with or without replacement, and regenerate the signal, measuring its height or range. Compare the feature height against this distribution to estimate its probability.

Usage

Diexcurht.test(ht, ndraw, xbase, nexcur, is.peak, lower.tail=TRUE, seed=0)
Dipermht.test(ht, xbase, nperm, lower.tail=TRUE, seed=0)

Value

Both functions return a list of class "Ditest" with elements

method: a string describing the test
statfn: function used to evaluate significance level/probability
statistic: what is tested, the height of the feature
statname: text string describing the statistic
parameter: other arguments provided to the function, named as such
p.value: probability of feature
alternative: a string describing the direction of the test vs. the null distribution
critval: critical values of the height, at quantiles 0.005, 0.01, 0.05, and 0.10, or 1-these if lower.tail is FALSE
xbase: a copy of the argument

statistic and p.value will have the length of the ht

argument. NA or 0 heights or draws will generate NA for the probabilities.

Arguments

ht: one or more feature heights
ndraw: the size of the feature, a vector as long as ht or a single value
xbase: a vector of values to draw from, differential values that can be summed to reconstruct the signal
nexcur: the number of times to repeat the draw
nperm: the number of times to permute the draw
is.peak: a boolean, TRUE to calculate height for peaks, FALSE for flats
lower.tail: a boolean, TRUE to count the number of simulated heights greater than ht, FALSE the number less than or equal to
seed: if positive, value to set the RNG seed before any sampling

Details

The excursion test determines the distribution of a feature's height by sampling from a set of differential data and doing a cumulative sum. The reconstructed feature height for a peak is the maximum sum above the lower of the first or last point, or the total range for flats, using is.peak to choose. As a bootstrap excursion this can be used with the difference of the low-pass or interval spacing for either peaks or flats. The resolution of the probability returned from the test is 1/nexcur.

The permutation test shuffles the values in xbase before re-summing them to simulate the signal. The test automatically uses the peak height definition. The base set should be the length of the runs within the feature, positive or negative or zero (ignoring length). If generated using rle, multiply $lengths by $values. Permutations are restricted so that adjacent values do not have the same sign, because this would form a longer run. This eliminates a large fraction of the possible permutations, 90% or more. If 5% of the possible number of permutations, the factorial of the length of xbase, is less than the requested sample size, the test will exhaustively check all possibilities rather than sampling. For nperm=5000 this happens at eight runs.

These tests are general and do not depend on the type of feature or spacing being analyzed. The caller, in this library the Dimodal function, is responsible for identifying features and their height and setting up the base set and draw size. NA or NaN heights will generate NA probabilities, as do 0 heights and draw sizes that are too small (minimum 3 for excursions, 2 for permutations). All base values must be finite. Bad argument values will raise errors.

If more than one height is provided the tests will evaluate them against one set of draws, unless the draw sizes also vary, in which case pairs of the arguments will be used.

A FALSE lower.tail is appropriate for peaks, TRUE for flats. The probability can have a large uncertainty if the tests generate tied values matching the height. This is mostly seen with the runs permutations, which generate discrete heights. The count uses half the ties, which is a simple form of mid-distribution correction.

The RNG seed, if not zero, is added to the draw counter and used to set up the random number generator before sampling, so that the results are repeatable. A value of zero leaves the setup alone.

The nexcur argument corresponds to option "excur.nrep" and seed to "excur.seed". Dimodal uses "excur.ntop" when generating xbase for excursions. The equivalents for the permutation test are "perm.nrep" and "perm.seed". The default number of excursions is higher than for permutations, based on an evaluation of the stability on artificial and real data. The distribution of the excursion heights is slow to settle, and has a large confidence interval; by 15,000 repetitions the median height has settled to within half a percent and the 90% confidence interval is two percent of that median. Fewer repetitions may suffice, although below 5,000 you may notice the variation in the test probabilities. The permutation test is more stable and fewer repetitions are needed, so that count has been given its own option. Internally Dimodal will adjust the excursion seed differently for each of the peak and flat tests, so that the results will be repeatable but different sequences will be used for each. The probabilities will be evaluated against "alpha.pkexcur.[lp|diw]", "alpha.runht", and "alpha.ftexcur.[lp|diw]".

Each draw runs in O(ndraw) time and memory, where ndraw is the size of the xbase set for the permutations. The permutation test takes noticeably longer with more than two symbols (ie. if there are ties) because a different sampling strategy is needed to guarantee the non-adjacency condition. Unless the package has been compiled to use the R sampling function we use the PCG random number generator from Melissa O'Neill for the excursion draws because it is much faster than the built-in sampling while offering greater bit depth.

References

https://pcg-random.org for the PCG random number generator.

Examples

Run this code

## Recommended number of excursions/permutations is larger.
## 2000 reduces the run-time.
set.seed(2) ; x <- round(runif(50,-1,1) * 10) / 10
Diexcurht.test(6.5, 30, x, 2000, TRUE, lower.tail=FALSE, seed=3)
xrun <- rle(sign(diff(x)))$values * rle(sign(diff(x)))$lengths
Dipermht.test(6.5, xrun, 2000, FALSE, 3)

Run the code above in your browser using DataLab